Abstract

Late-onset Alzheimer's disease (LOAD) is a multifactorial disorder with over twenty loci associated with disease risk. Given the number of genome-wide significant variants that fall outside of coding regions, it is possible that some of these variants alter gene expression rather than tag coding variants that alter protein structure and/or function. RegulomeDB is a database that annotates regulatory functions of genetic variants. In this study, we used RegulomeDB to investigate potential regulatory functions of lead single nucleotide polymorphisms (SNPs) identified in five genome-wide association studies (GWAS) of risk and age-at-onset (AAO) of LOAD, as well as of SNPs in linkage disequilibrium (LD; r² ≥ 0.80) with the lead GWAS SNPs. Of a total of 614 SNPs examined, 394 returned RegulomeDB scores of 1-6. Of those 394 variants, 34 showed strong evidence of regulatory function (RegulomeDB score <3), and only 3 of them were genome-wide significant SNPs (ZCWPW1/rs1476679, CLU/rs1532278, and ABCA7/rs3764650). This study further supports the hypothesis that some of the non-coding GWAS SNPs are true associations rather than tagged associations, and it demonstrates the application of RegulomeDB to GWAS data.
Introduction

Over 1200 genome-wide association studies (GWAS) have been published since 2005 [1]. While some of these studies have been crucial for identifying genes responsible for disease phenotypes, including genes involved in inflammatory bowel disease and age-related macular degeneration, the majority of variants identified show modest effect sizes at best. Furthermore, 88% of significant variants are located in intronic or intergenic regions that do not encode proteins, suggesting that their association with disease may arise through mechanisms other than changes in protein structure and/or function [2].

Given these findings, researchers have recently begun to consider the implications of these non-coding variants. One such consideration is the possibility that, splice-site variants and promoters aside, introns and intergenic regions are not "junk DNA" as previously believed, but possess regulatory properties that modify gene expression. Indeed, only 2% of the human genome encodes proteins; the remaining 98% is "non-functional" only in the sense that it does not encode proteins. Rather, the bulk of the genome is composed of repeat regions, introns, and transposons [3]. Multiple molecular techniques have been employed to determine chromatin structure, methylation, and protein-binding motifs and occupancy in order to assess the effect of non-coding variants on transcription [4]. RegulomeDB is a database developed to capture these data and, in turn, to assess the likelihood that a particular variant affects transcription factor binding. The advent of such databases is advantageous for studying gene associations in complex diseases [2].

Late-onset Alzheimer's disease (LOAD) is one such disease that may be better understood by examining the regulatory function of associated SNPs. Thus far, GWAS of LOAD have identified over 20 significantly associated risk loci [5-7]. In addition, several suggestive loci for risk and age-at-onset (AAO) of AD have also been implicated [8], [9]. Of these loci, only one, APOE, shows a strong effect size, substantially increasing risk for individuals homozygous for the APOE*4 allele, especially after age 75 [10], [11]. The remaining loci have only weak to modest effect sizes. In this study, we demonstrate the utility of two publicly available bioinformatics tools, the Broad Institute's SNP Annotation and Proxy search (SNAP) tool (http://www.broadinstitute.org/mpg/snap/) [12] and RegulomeDB (http://regulomedb.org) [2], for investigating potential regulatory functions of recently identified, non-APOE variants (index and proxy SNPs) at known and suggestive loci associated with risk and AAO of LOAD.

Methods 

SNP selection 

We selected a total of 44 genome-wide significant or suggestive single-nucleotide polymorphisms (SNPs) reported for risk or AAO of AD (see Table S1). Included among these SNPs were the 28 genome-wide significant SNPs from 21 non-APOE LOAD risk loci (PICALM, BIN1, CD33, CD2AP, MS4A4A/MS4A6E, ABCA7, EPHA1, CLU, CR1, HLA-DRB5/HLA-DRB1, PTK2B, SORL1, SLC24A4/RIN3, DSG2, INPP5D, MEF2C, NME8, ZCWPW1, CELF1, FERMT2, and CASS4) [5-7] and 16 SNPs from novel suggestive loci identified in two GWAS of risk and AAO of LOAD (DCHS, HRK/RNFT2, ADAMTS9, KCNV2/VLDLR, LEMD2/MLN/MIR1275, LOC390958/SEC11C, ZNF592/ALPK3/SLC28A1, PSMD1/HTR2B/ARMC9, NRXN3, PPP1R3B, MMP3/MMP12, FLJ37543, PCDH7, LOC440390, MAPRE1P2 pseudogene, and PP1R2P5 pseudogene) [8], [9]. IRB approval and informed consent procedures were described in each of the publications from which SNPs were selected [5-9].

Linkage Disequilibrium 

Following SNP selection, we used the SNAP web portal [accessed 4 September 2013] [12] to identify SNPs in linkage disequilibrium (LD) (r² ≥ 0.80) with our SNPs of interest. SNAP allows users to find proxy SNPs based on LD determined in the CEU population from the International HapMap (v3) or 1000 Genomes Pilot 1 projects. SNAP searches were not restricted to any genotyping array, and the identified SNPs could include the queried SNPs as proxies for themselves. At r² ≥ 0.80, the SNAP portal found 570 SNPs in LD with the 44 GWAS SNPs. SNAP proxy searches were repeated with r² thresholds of 0.90 and 1.0 to better assess associations among related SNPs. These higher thresholds yielded totals of 472 and 191 identified SNPs, respectively. As expected, the number of identified SNPs in LD with the 44 published SNPs decreased as the r² threshold increased. Table 1 summarizes the total number of SNPs in LD at all three thresholds for both HapMap3 and 1000 Genomes searches. All published SNPs and their respective proxy SNPs for each r² threshold are listed in Table S1.
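To make the r² thresholds concrete, the Python sketch below computes pairwise LD as r² = D²/[pA(1 - pA)pB(1 - pB)], with D = pAB - pA*pB, from phased haplotypes coded 0/1, and then lists the proxies that pass each threshold. It illustrates the statistic only, not SNAP itself, which looks up precomputed LD from the HapMap and 1000 Genomes reference panels. The haplotype vectors and SNP names are hypothetical toy data, chosen so that raising the threshold from 0.80 to 0.90 to 1.0 shrinks the proxy list, as in Table 1.

```python
from typing import Dict, List, Sequence


def r_squared(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """LD r^2 between two biallelic SNPs observed on the same phased haplotypes.

    Alleles are coded 0/1; position i in each vector is haplotype i.
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("expected two non-empty vectors of equal length")
    p_a = sum(snp_a) / n                                  # allele-1 frequency at SNP A
    p_b = sum(snp_b) / n                                  # allele-1 frequency at SNP B
    p_ab = sum(a & b for a, b in zip(snp_a, snp_b)) / n   # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    return d * d / denom


def proxies(index: Sequence[int], candidates: Dict[str, Sequence[int]],
            threshold: float) -> List[str]:
    """Names of candidate SNPs whose r^2 with the index SNP is >= threshold."""
    return sorted(name for name, alleles in candidates.items()
                  if r_squared(index, alleles) >= threshold)


# Hypothetical toy panel of 40 phased haplotypes (not real HapMap/1000 Genomes data).
index_snp = [1] * 20 + [0] * 20
candidates = {
    "snp_a": [1] * 20 + [0] * 20,   # identical to the index SNP, r^2 = 1.0
    "snp_b": [1] * 21 + [0] * 19,   # r^2 ~ 0.905
    "snp_c": [1] * 22 + [0] * 18,   # r^2 ~ 0.818
    "snp_d": [1] * 23 + [0] * 17,   # r^2 ~ 0.739
    "snp_e": [1, 0] * 20,           # independent of the index SNP, r^2 = 0
}

for t in (0.80, 0.90, 1.0):
    print(t, proxies(index_snp, candidates, t))
```

With these toy inputs the proxy list is snp_a, snp_b, and snp_c at 0.80; snp_a and snp_b at 0.90; and only the perfectly correlated snp_a at 1.0.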

RegulomeDB 

RegulomeDB is a database providing functional annotation of SNPs based on data from the ENCODE Project Consortium (2012), the NCBI Sequence Read Archive, and other sources, totaling 962 data sets. It is free and publicly accessible (http://www.regulomedb.org) and has a straightforward interface. With almost 60 million annotations, it should prove a valuable tool for future examination of gene expression and disease traits. Variants are classified into one of four RegulomeDB categories, with scores ranging from 1 to 6 indicating putative functions. Scores and the corresponding functional evidence are listed in Table 2. All reported SNPs and SNPs in LD (using the r² ≥ 0.80 list) were examined for potential regulatory functions using RegulomeDB (http://regulomedb.org, accessed 4 September 2013) [2].
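The category hierarchy summarized in Table 2 can be read as an ordered list of rules in which a variant receives the most stringent category whose evidence requirements it satisfies. The Python sketch below encodes that reading. It is an illustration under our own assumptions, not RegulomeDB's scoring code: each evidence type is reduced to a boolean flag, a matched motif or matched footprint is also counted as "any motif" or "footprint", and a variant that meets none of the rules is labeled "No Data" as a simplification. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Boolean evidence flags for one variant (a simplification of RegulomeDB's inputs)."""
    eqtl: bool = False
    tf_binding: bool = False          # TF ChIP-seq binding overlaps the variant
    matched_motif: bool = False       # motif of the same TF that is seen binding
    any_motif: bool = False           # any TF motif hit
    matched_footprint: bool = False   # DNase footprint matching the TF motif
    footprint: bool = False           # any DNase footprint
    dnase_peak: bool = False


def regulome_category(e: Evidence) -> str:
    """Return the first (most stringent) Table 2 category whose requirements are met."""
    motif = e.any_motif or e.matched_motif          # a matched motif is also "any motif"
    footprint = e.footprint or e.matched_footprint  # a matched footprint is also a footprint
    rules = [
        ("1a", e.eqtl and e.tf_binding and e.matched_motif and e.matched_footprint and e.dnase_peak),
        ("1b", e.eqtl and e.tf_binding and motif and footprint and e.dnase_peak),
        ("1c", e.eqtl and e.tf_binding and e.matched_motif and e.dnase_peak),
        ("1d", e.eqtl and e.tf_binding and motif and e.dnase_peak),
        ("1e", e.eqtl and e.tf_binding and e.matched_motif),
        ("1f", e.eqtl and (e.tf_binding or e.dnase_peak)),
        ("2a", e.tf_binding and e.matched_motif and e.matched_footprint and e.dnase_peak),
        ("2b", e.tf_binding and motif and footprint and e.dnase_peak),
        ("2c", e.tf_binding and e.matched_motif and e.dnase_peak),
        ("3a", e.tf_binding and motif and e.dnase_peak),
        ("3b", e.tf_binding and e.matched_motif),
        ("4", e.tf_binding and e.dnase_peak),
        ("5", e.tf_binding or e.dnase_peak),
        ("6", motif),
    ]
    for category, satisfied in rules:
        if satisfied:
            return category
    return "No Data"


print(regulome_category(Evidence(eqtl=True, tf_binding=True)))
```

Ordering the rules from most to least evidence is what makes a single scan sufficient: any variant satisfying a stricter row is returned before the looser rows are considered.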

Table 1. Number of SNPs in linkage disequilibrium with all published GWAS SNPs for the HapMap3 and 1000 Genomes populations at the tested r² thresholds.

                              Linkage disequilibrium (r²) threshold
                              0.80      0.90      1.0
1000 Genomes                   612       466       189
HapMap3                        122        85        62
TOTAL (overlaps removed)       614       472       191

doi:10.1371/journal.pone.0095152.t001
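Assuming the TOTAL row is the union of the 1000 Genomes and HapMap3 proxy lists with duplicates counted once, the number of SNPs found in both panels follows from inclusion-exclusion. This is arithmetic on the Table 1 counts, not a figure reported by SNAP; S_1KG and S_HM3 denote the two proxy sets.

```latex
\begin{aligned}
|S_{\mathrm{1KG}} \cap S_{\mathrm{HM3}}| &= |S_{\mathrm{1KG}}| + |S_{\mathrm{HM3}}| - |S_{\mathrm{1KG}} \cup S_{\mathrm{HM3}}| \\
r^2 \ge 0.80:&\quad 612 + 122 - 614 = 120 \\
r^2 \ge 0.90:&\quad 466 + 85 - 472 = 79 \\
r^2 = 1.0:&\quad 189 + 62 - 191 = 60
\end{aligned}
```

Under that assumption, HapMap3 contributed only 2, 6, and 2 SNPs not already found with 1000 Genomes (122 - 120, 85 - 79, and 62 - 60), which matches the differences between the TOTAL and 1000 Genomes rows.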



Results 

Of the 614 SNPs examined in RegulomeDB, 220 had RegulomeDB scores of "No Data", and the remaining 394 returned scores of 1-6. Of those 394 variants, 34 had a RegulomeDB score of less than 3 (Table 3 and Table S2), indicating a relatively high degree of evidence for potential regulatory function ("likely to affect binding"). Interestingly, only 3 of these 34 SNPs were the reported genome-wide significant SNPs (ZCWPW1/rs1476679, score = 1f; ABCA7/rs3764650, score = 2a; and CLU/rs1532278, score = 2b), one was a reported suggestive SNP (HRK/RNFT2/rs17429217, score = 2b), and the remaining 30 were in LD (r² ≥ 0.80) with the 44 lead SNPs reported in LOAD GWA studies [5-9]. Table 4 summarizes LD between the regulatory SNPs with RegulomeDB scores <3 and the published GWAS SNPs. Only one of the 34 SNPs had a score of 1b, while 18 had a score of 1f, 4 a score of 2a, and 11 a score of 2b.
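The filtering step described above (keeping scores below 3, i.e. RegulomeDB categories 1 and 2) and the per-category tally can be sketched in Python as follows. The scores are transcribed from Table 3, and rs10838725 (score 6, as noted for the CELF1 GWAS SNP) is included as an example that is filtered out. The helper names are hypothetical, and treating the leading digit as the category rank is our assumption, which holds for scores of the form listed in Table 2.

```python
from collections import Counter
from typing import Dict, Optional


def score_rank(score: str) -> Optional[int]:
    """Leading digit of a RegulomeDB score ("1f" -> 1, "6" -> 6); None for "No Data"."""
    s = score.strip()
    return int(s[0]) if s[:1].isdigit() else None


def strong_evidence(scores: Dict[str, str], cutoff: int = 3) -> Dict[str, str]:
    """Keep SNPs whose RegulomeDB score is below `cutoff` (categories 1 and 2 by default)."""
    kept: Dict[str, str] = {}
    for rsid, score in scores.items():
        rank = score_rank(score)
        if rank is not None and rank < cutoff:
            kept[rsid] = score
    return kept


# Scores transcribed from Table 3, plus rs10838725 (score 6) as a counterexample.
SCORE_1F = ["rs7103835", "rs6485758", "rs11039290", "rs2293576", "rs7933019",
            "rs2280231", "rs7120548", "rs7114011", "rs1476679", "rs17057043",
            "rs1303615", "rs617135", "rs11230180", "rs2123314", "rs2081547",
            "rs655231", "rs12917429", "rs12909280"]
SCORE_2A = ["rs636317", "rs636341", "rs1237999", "rs3764650"]
SCORE_2B = ["rs17429217", "rs4715019", "rs1532278", "rs7933202", "rs542126",
            "rs4147911", "rs6024870", "rs11689287", "rs812651", "rs73223431",
            "rs867230"]

scores: Dict[str, str] = {"rs667897": "1b", "rs10838725": "6"}
scores.update({rs: "1f" for rs in SCORE_1F})
scores.update({rs: "2a" for rs in SCORE_2A})
scores.update({rs: "2b" for rs in SCORE_2B})

strong = strong_evidence(scores)
tally = Counter(strong.values())
print(len(strong), sorted(tally.items()))
```

Run on these inputs, the filter keeps 34 SNPs, and the tally matches the counts given above: one 1b, eighteen 1f, four 2a, and eleven 2b.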

A total of 10 confirmed loci and 3 suggestive loci harbored SNPs with a RegulomeDB score <3. The SNP with the most evidence for regulatory function was rs667897, with a RegulomeDB score of 1b. This intergenic SNP is located in the MS4A region, just downstream of MS4A6A. Nine other SNPs in the MS4A region (of the 157 SNPs tested in this region) also had scores of less than 3, as did 20 SNPs in 9 other confirmed LOAD risk loci: ZCWPW1 (1 of 8 SNPs tested), CLU (2 of 10), ABCA7 (2 of 7), CELF1 (8 of 25), PTK2B (2 of 6), CASS4 (1 of 11), PICALM (2 of 93), CD2AP (1 of 69), and BIN1 (1 of 6). Notably, eight SNPs in the CELF1 gene region on chromosome 11 (SLC39A13/rs2293576, CELF1/rs7933019, NDUFS3/rs2280231, MTCH2/rs7120548, NUP160/rs7114011, CELF1/rs11039290, CELF1/rs6485758, and RAPSN/rs7103835), all with scores of 1f, are in LD with the genome-wide significant CELF1/rs10838725 SNP, which itself shows only minimal evidence of regulatory function in RegulomeDB (score = 6). All eight are eQTLs for C1QTNF4, and three of them (SLC39A13/rs2293576, MTCH2/rs7120548, and NUP160/rs7114011) also affect expression of MYBPC3 and SPI1. The three suggestive novel loci ADAMTS9, ZNF592/ALPK3/SLC28A1, and HRK/RNFT2 also had variants with strong evidence for regulatory function, with scores of 2b (1 of 6 SNPs tested), 1f (2 of 7 SNPs tested), and 2b (1 of 1 SNP tested), respectively.

Of the 30 SNPs that were in LD with reported genome-wide significant variants and had strong evidence of regulatory function, 10 were located in the MS4A region, including the SNP with the most evidence for regulatory function, rs667897 (RegulomeDB score = 1b). According to RegulomeDB, rs667897 affects binding of 21 different proteins, including BRCA1, SMARCC2, FOXA1, JUN, and POLR2A, and falls within both the TCF11:MafG and NFE2L2 binding motifs. Six other SNPs in the MS4A region, 5 intergenic (rs1303615, rs617135, rs11230180, rs2123314, and rs655231) and 1 intronic (MS4A4E/rs2081547), had RegulomeDB scores of 1f and, like the top hit rs667897, all are eQTLs for MS4A4A based on expression data from monocytes. Proteins whose binding is affected by these SNPs include CEBPB, JUN, JUND, POLR2A, and SMARCC2, which are also among the proteins affected by the top MS4A region hit, rs667897; however, motifs containing these variants have yet to be determined. Three more SNPs in the MS4A region, rs636317, rs636341, and rs7933202, were likely to affect binding according to RegulomeDB (score = 2a, 2a, and 2b, respectively).

The reported ZCWPW1/rs1476679 SNP (RegulomeDB score = 1f) is an eQTL for GATS, PILRB, and TRIM4 and, similar to other functional variants in our dataset, affects binding of RFX3 as well as CTCF. Two intronic CLU SNPs, rs1532278 and rs867230, show evidence of regulatory function (RegulomeDB score = 2b each). The genome-wide significant CLU/rs1532278 SNP, located in intron 3 of CLU, affects binding of NANOG, TAF1, USF1, MAX, USF2, and GATA2 and is situated in the Nkx2-5 binding motif. The second variant, CLU/rs867230, located in the first intron of CLU, affects binding of GATA1 and GABPA and alters the MEF-2 and Zfp740 motifs.

Table 2. RegulomeDB category summaries [2].

Category  Description
Likely to affect binding and linked to expression of a gene target
1a        eQTL + TF binding + matched TF motif + matched DNase footprint + DNase peak
1b        eQTL + TF binding + any motif + DNase footprint + DNase peak
1c        eQTL + TF binding + matched TF motif + DNase peak
1d        eQTL + TF binding + any motif + DNase peak
1e        eQTL + TF binding + matched TF motif
1f        eQTL + TF binding/DNase peak
Likely to affect binding
2a        TF binding + matched TF motif + matched DNase footprint + DNase peak
2b        TF binding + any motif + DNase footprint + DNase peak
2c        TF binding + matched TF motif + DNase peak
Less likely to affect binding
3a        TF binding + any motif + DNase peak
3b        TF binding + matched TF motif
Minimal binding evidence
4         TF binding + DNase peak
5         TF binding or DNase peak
6         Motif hit

doi:10.1371/journal.pone.0095152.t002

The genome-wide significant ABCA7/rs3764650 SNP (RegulomeDB score = 2a) shows evidence of binding six different proteins (SP1, HNF4A, HNF4G, BHLHE40, USF1, and USF2) and is located in the binding motifs for TBX22, TBX18, TBX15, HNF4alpha1, COUPTF, HNF4, COUP-TF = HNF-4, and NR2F1. Another ABCA7 SNP, rs4147911 (RegulomeDB score = 2b), affects binding of IKZF1.

An eQTL for both DPYSL2 and PTK2B, rs17057043 (RegulomeDB score = 1f), is located in intron 5 of PTK2B, lies within the N-Myc and RBP-Jkappa binding motifs, and affects binding of IRF1. The PTK2B/rs73223431 SNP, also located in intron 5 of PTK2B, has a score of 2b. Similar to ABCA7/rs3764650, PTK2B/rs73223431 falls in the binding motifs of TBX22, TBX15, and TBX18, among others.

Discussion 

As the list of associated LOAD risk loci continues to grow, it becomes increasingly important to decipher the biological underpinnings of these associations. If we accept that these associations are real, we must endeavor to explain them. Since many of these risk variants are non-coding, one logical explanation for their association is an effect on gene expression. The ENCODE project has made invaluable contributions to this area of research with a wealth of publicly available data for interpretation and expansion. These data are ideal for generating hypotheses and furthering our understanding of gene expression and epistasis. Here we have used two publicly available bioinformatics tools, SNAP and RegulomeDB, to investigate potential regulatory functions of non-APOE SNPs implicated in risk and AAO of LOAD.

Of the 21 non-APOE genome-wide significant risk loci, ten (ZCWPW1, CLU, ABCA7, MS4A4A/MS4A6E, PICALM, CD2AP, BIN1, CELF1, CASS4, and PTK2B) had SNPs with functional evidence. Of the 16 suggestive novel loci, three (HRK/RNFT2, ADAMTS9, and ZNF592/ALPK3/SLC28A1) had SNPs with functional evidence. Importantly, only three of the 34 SNPs with evidence for potential regulatory function based on RegulomeDB score were the reported genome-wide significant SNPs [ZCWPW1/rs1476679 (score = 1f), CLU/rs1532278 (score = 2b), and ABCA7/rs3764650 (score = 2a)], and one was a reported suggestive SNP [HRK/RNFT2/rs17429217 (score = 2b)]. All three reported genome-wide significant SNPs are intronic, and our findings suggest that they, rather than the SNPs in LD with them, are causal for LOAD risk via a regulatory mechanism.
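The distinction drawn above, between GWAS SNPs that carry regulatory evidence themselves and loci where that evidence sits only on SNPs in LD with them, can be read directly off the index-to-proxy pairs in Table 4. The Python sketch below does this for a subset of those pairs; the dictionary is transcribed from Table 4, while the function name and output wording are our own. An index SNP counts as functional here when it appears among its own functional proxies, which is possible because SNAP reports each queried SNP as a proxy for itself.

```python
from typing import Dict, List

# GWAS index SNP -> functional proxies (RegulomeDB score < 3); a subset of the Table 4 pairs.
FUNCTIONAL_PROXIES: Dict[str, List[str]] = {
    "ZCWPW1/rs1476679": ["rs1476679"],
    "CLU/rs1532278": ["rs1532278", "rs867230"],
    "ABCA7/rs3764650": ["rs3764650", "rs4147911"],
    "HRK/RNFT2/rs17429217": ["rs17429217"],
    "PTK2B/rs28834970": ["rs17057043", "rs73223431"],
    "CELF1/rs10838725": ["rs2293576", "rs7933019", "rs2280231", "rs7120548",
                         "rs7114011", "rs11039290", "rs6485758", "rs7103835"],
}


def index_is_functional(gwas_snp: str, proxies: List[str]) -> bool:
    """True if the GWAS index SNP is itself among its functional proxies."""
    index_rsid = gwas_snp.rsplit("/", 1)[-1]  # the rsID is the last "/"-separated field
    return index_rsid in proxies


for snp, proxies in FUNCTIONAL_PROXIES.items():
    where = "index SNP" if index_is_functional(snp, proxies) else f"{len(proxies)} LD proxies only"
    print(f"{snp}: regulatory evidence at {where}")
```

For this subset, ZCWPW1, CLU, ABCA7, and HRK/RNFT2 are flagged at the index SNP, whereas the PTK2B and CELF1 signals come only from proxies.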

None of the ten MS4A region SNPs with a score of <3 are reported GWAS SNPs, which illustrates the difficulty of differentiating between a true signal and a tag signal in association studies and highlights the complexity of interactions between genetic variants and disease risk. Among the remaining 24 putative regulatory variants, representing 12 loci other than the MS4A region, we observe some thought-provoking findings. For example, the synonymous variant SLC39A13/rs2293576 (in LD with CELF1/rs10838725, r² ≥ 0.80) is unique in this dataset because it is the only SNP with regulatory evidence that resides in an exon, a reminder that regulatory elements can be found within coding sequences as well as in intergenic regions and introns. Furthermore, eight putative regulatory variants located in six different genes (including the synonymous SLC39A13/rs2293576 variant) are in LD with the reported CELF1/rs10838725 SNP, and all are
Table 3. Details of study SNPs with putative regulatory function (RegulomeDB score <3).

Coordinate (0-based) | dbSNP ID | RegulomeDB score | Gene/Locus | Position* (per dbSNP) | eQTL | Motifs | Protein binding
chr11:59936978 | rs667897 | 1b | MS4A region | intergenic; downstream MS4A6A; downstream MS4A2 | MS4A4A | TCF11:MafG, NFE2L2 | SMARCC2, STAT1, JUN, MXI1, YY1, CEBPB, GTF2F1, FOXA1, FOS, USF1, STAT3, BRCA1, EP300, POLR2A, ELK4, PRDM1, RFX5, GATA2, TRIM28, SETDB1, JUND
chr11:47461692 | rs7103835 | 1f | RAPSN | intron 4 | C1QTNF4 | - | -
chr11:47530023 | rs6485758 | 1f | CELF1 | intron 1 | C1QTNF4 | - | -
chr11:47572278 | rs11039290 | 1f | CELF1 | intron 1 | C1QTNF4 | - | -
chr11:47434985 | rs2293576 | 1f | SLC39A13 | synonymous | C1QTNF4, MYBPC3, SPI1 | - | -
chr11:47509136 | rs7933019 | 1f | CELF1 | intron 2 | C1QTNF4 | Glis2 | RAD21
chr11:47600437 | rs2280231 | 1f | NDUFS3 | 5' UTR | C1QTNF4 | Mtf1 | BCLAF1, CHD2, CREBBP, CTBP2, E2F1, E2F4, ELF1, ELK4, EP300, ERG, ETS1, EWSR1, FLI1, GABPA, GATA1, GTF2F1, HNF4A, IRF1, IRF4, JUNB, JUND, MYC, NFKB1, NR2C2, PAX5, POLR2A, SIX5, SMARCB1, SMARCC1, SP1, STAT1, TAF1, TBP, TCF4, TRIM28, USF1, WRNIP1, ZBTB7A, ZEB1
chr11:47662931 | rs7120548 | 1f | MTCH2 | intron 1 | C1QTNF4, MYBPC3, SPI1 | - | -
chr11:47811308 | rs7114011 | 1f | NUP160 | intron 29 | C1QTNF4, MYBPC3, SPI1 | AP-3, Oct-1, Irx-3 | -
chr7:100004445 | rs1476679 | 1f | ZCWPW1 | intron 11 | GATS, PILRB, TRIM4 | - | RFX3, CTCF
chr8:27220309 | rs17057043 | 1f | PTK2B | intron 5 | DPYSL2, PTK2B | N-Myc, RBP-Jkappa | IRF1
chr11:59885119 | rs1303615 | 1f | MS4A region | intergenic; downstream MS4A2; downstream MS4A6A | MS4A4A | - | -
chr11:59936756 | rs617135 | 1f | MS4A region | intergenic; downstream MS4A6A | MS4A4A | - | POLR2A, MAX, SMARCC2
chr11:59961485 | rs11230180 | 1f | MS4A region | intergenic; upstream MS4A6A; downstream MS4A4E | MS4A4A | - | JUNB, NFKB1
chr11:59966294 | rs2123314 | 1f | MS4A region | intergenic; upstream MS4A6A; downstream MS4A4E | MS4A4A | - | -
chr11:59989429 | rs2081547 | 1f | MS4A4E | intron 2 | MS4A4A | - | CEBPB, JUN, JUND
chr11:60013856 | rs655231 | 1f | MS4A region | intergenic; upstream MS4A4E; upstream MS4A4A | MS4A4A | - | RFX3
chr15:85425096 | rs12917429 | 1f | SLC28A1 | intergenic; upstream SLC28A1 | NMB | - | -
chr15:85429355 | rs12909280 | 1f | SLC28A1 | intron 1 | NMB | - | RFX3
chr11:60019149 | rs636317 | 2a | MS4A region | intergenic; upstream MS4A4E; upstream MS4A4A | - | TAL1, CTCF | CTCF, RAD21, FOXA1, SMC3, BCLAF1, YY1, POU2F2, ZNF143
chr11:60019160 | rs636341 | 2a | MS4A region | intergenic; upstream MS4A4E; upstream MS4A4A | - | CTCF, STAT1:STAT1, C/EBPbeta | CTCF, RAD21, FOXA1, SMC3, BCLAF1, YY1, POU2F2, ZNF143
chr11:85815029 | rs1237999 | 2a | PICALM region | intergenic; upstream PICALM | - | AP-1, Jundm2 | JUN, JUNB, JUND, FOS
chr19:1046519 | rs3764650 | 2a | ABCA7 | intron 13 | - | TBX22, TBX18, TBX15, HNF4alpha1, COUPTF, HNF4, COUP-TF = HNF-4, NR2F1 | SP1, HNF4A, HNF4G, BHLHE40, USF1, USF2
chr12:117295332 | rs17429217 | 2b | HRK/RNFT2 region | intergenic; downstream RNFT2; downstream HRK | - | HNF4 = COUP, Hnf4a | EBF1
chr6:47447040 | rs4715019 | 2b | CD2AP | intron 1 | - | Irx-3, Sox15, HoxB5, Zfp105, Hoxa3, Dlx1, Hoxb8, Irx6, Hoxa6, Hoxb6, Hoxb5 | POLR2A
chr8:27466314 | rs1532278 | 2b | CLU | intron 3 | - | Nkx2-5 | NANOG, TAF1, USF1, MAX, USF2, GATA2
chr11:59936925 | rs7933202 | 2b | MS4A region | intergenic; downstream MS4A6A | - | DMRT3 | POLR2A, SMARCC2, STAT1, JUN, YY1, CEBPB, GTF2F1, FOS, USF1, STAT3, EP300, ELK4, MXI1, FOXA1, BRCA1, PRDM1, GATA2, TRIM28, SETDB1, JUND
chr11:85811237 | rs542126 | 2b | PICALM region | intergenic; upstream PICALM | - | E47 | HNF4A, RXRA, POLR3A, USF2, USF1
chr19:1047686 | rs4147911 | 2b | ABCA7 | intron 16 | - | HEN1 | IKZF1
chr20:54997567 | rs6024870 | 2b | CASS4 | intron 2 | - | Pax-3 | TCF4, CTCF, FOS, RAD21
chr2:127888336 | rs11689287 | 2b | BIN1 region | intergenic; upstream BIN1 | - | FOXL1, Oct-1, Six6, FOXP1, Tbp | CTCF
chr3:64918621 | rs812651 | 2b | ADAMTS9-AS2 | intron 4 | - | RBP-Jkappa | HNF4A, SETDB1
chr8:27219986 | rs73223431 | 2b | PTK2B | intron 5 | - | TRUE, TBX15, TBX18, TBX22, T, Brachyury | MYC, GATA1, CDX2, POLR2A, JUN, NFKB1, HNF4A, GATA2
chr8:27468502 | rs867230 | 2b | CLU | intron 1 | - | MEF-2, Zfp740 | GATA1, GABPA

Bolded SNPs are published GWAS SNPs.
*Upstream/downstream designation based upon gene direction per NCBI.
doi:10.1371/journal.pone.0095152.t003
Table 4. Linkage disequilibrium for published GWAS SNPs with functional proxies (RegulomeDB score <3) according to SNAP search.

GWAS SNP | Functional proxy SNP | RegulomeDB score
ABCA7/rs3764650 | ABCA7/rs3764650*** | 2a
ABCA7/rs3764650 | ABCA7/rs4147911** | 2b
HRK/RNFT2/rs17429217 | HRK/RNFT2/rs17429217*** | 2b
ADAMTS9/rs704454 | ADAMTS9-AS2/rs812651** | 2b
CASS4/rs7274581 | CASS4/rs6024870* | 2b
ZCWPW1/rs1476679 | ZCWPW1/rs1476679*** | 1f
CD2AP/rs9296559 | CD2AP/rs4715019*** | 2b
CD2AP/rs9349407 | CD2AP/rs4715019*** | 2b
CELF1/rs10838725 | SLC39A13/rs2293576* | 1f
CELF1/rs10838725 | CELF1/rs7933019** | 1f
CELF1/rs10838725 | NDUFS3/rs2280231** | 1f
CELF1/rs10838725 | MTCH2/rs7120548* | 1f
CELF1/rs10838725 | NUP160/rs7114011** | 1f
CELF1/rs10838725 | CELF1/rs11039290** | 1f
CELF1/rs10838725 | CELF1/rs6485758** | 1f
CELF1/rs10838725 | RAPSN/rs7103835* | 1f
PTK2B/rs28834970 | PTK2B/rs17057043* | 1f
PTK2B/rs28834970 | PTK2B/rs73223431* | 2b
CLU/rs1532278 | CLU/rs1532278*** | 2b
CLU/rs1532278 | CLU/rs867230* | 2b
PICALM/rs561655 | PICALM/rs1237999* | 2a
PICALM/rs561655 | PICALM/rs542126* | 2b
ZNF592/ALPK3/SLC28A1/rs3743162 | SLC28A1/rs12917429*** | 1f
ZNF592/ALPK3/SLC28A1/rs3743162 | SLC28A1/rs12909280*** | 1f
BIN1/rs7561528 | BIN1/rs11689287** | 2b
MS4A4A/rs4938933 | MS4A region/rs11230180* | 1f
MS4A4A/rs4938933 | MS4A region/rs2123314* | 1f
MS4A4A/rs4938933 | MS4A4E/rs2081547* | 1f
MS4A4A/rs4938933 | MS4A region/rs655231** | 1f
MS4A4A/rs4938933 | MS4A region/rs636341** | 2a
MS4A4A/rs4938933 | MS4A region/rs636317** | 2a
MS4A4A/rs4938933 | MS4A region/rs7933202* | 2b
MS4A6A/rs610932 | MS4A region/rs7933202* | 2b
MS4A6A/rs610932 | MS4A region/rs667897* | 1b
MS4A6A/rs610932 | MS4A region/rs1303615* | 1f
MS4A6A/rs610932 | MS4A region/rs617135* | 1f

Linkage disequilibrium (r²) values are indicated as *r² ≥ 0.80, **r² ≥ 0.90, and ***r² = 1.0.
Bolded SNPs are GWAS SNPs with regulatory function.
doi:10.1371/journal.pone.0095152.t004



eQTLs for C1QTNF4. These results suggest that future work should examine C1QTNF4 (also known as CTRP4) as a potential player in LOAD risk in addition to the currently implicated CELF1 gene. C1QTNF4 is an inflammatory cytokine capable of activating both the Stat3/IL6 and NF-kB pathways, as shown in cancer cells [13]. The involvement of inflammatory pathways in AD pathogenesis and the inverse association between AD and cancer may in part explain the observed relationship between these SNPs and their effect on C1QTNF4 expression [14], [15].



According to RegulomeDB, binding of the IKAROS family zinc finger 1 (Ikaros) transcription factor, IKZF1, is affected by ABCA7/rs4147911 (score = 2b). It is worth noting that the expression of another LOAD risk gene, INPP5D, is regulated by the Ikaros transcription factor family in B cells [16], suggesting a potential functional link between ABCA7 and INPP5D. Similarly, RegulomeDB findings point to other proteins whose binding appears to be affected by variants at different LOAD loci (Table 3). Another position of interest is intron 5 of PTK2B. Two variants in this intron had RegulomeDB scores less than 3 (rs17057043, score = 1f; rs73223431, score = 2b), suggesting that intron 5 of PTK2B may play an important role in the binding of regulatory proteins and, consequently, in the risk of LOAD.

Variants in the reported suggestive novel loci for AAO of AD, ZNF592/ALPK3/SLC28A1, HRK/RNFT2, and ADAMTS9, are also of potential functional importance, as reflected by RegulomeDB scores of 1f, 2b, and 2b, respectively. Both rs12917429 and rs12909280 in the SLC28A1 region are eQTLs for neuromedin B (NMB), with the latter SNP suggested to affect binding of RFX3. According to GeneCards [17], NMB is a ligand that binds to bombesin receptors to trigger smooth muscle contraction. The bombesin peptides and their receptors have been implicated in a variety of cellular processes and are frequently overexpressed in cancer cells [18], [19]. RFX3 has been shown to be required for proper corpus callosum development in mice [20]. RFX3 also affects expression of glucokinase and, in turn, the differentiation and function of beta cells [21]. Two other SNPs with RegulomeDB scores of 1f, ZCWPW1/rs1476679 and rs655231 (MS4A region), show indications of affecting binding of RFX3 in K562 (chronic myelogenous leukemia, CML) cells. Given the proposed link between insulin resistance and AD through insulin-degrading enzyme (IDE), RFX3 may be an interesting transcription factor to examine in the context of LOAD pathogenesis [22].
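The observation that some proteins, RFX3 in particular, appear at more than one locus amounts to inverting the SNP-to-protein annotations of Table 3 into a protein-to-locus index. A minimal Python sketch over a hand-picked subset of Table 3 rows is shown below; the subset, the locus labels, and the function name are our own choices for illustration. Loci rather than SNPs are counted, so the two ABCA7 variants count once.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (locus, rsID) -> proteins whose binding RegulomeDB reports as affected; a subset of Table 3.
BINDING: Dict[Tuple[str, str], List[str]] = {
    ("ZCWPW1", "rs1476679"): ["RFX3", "CTCF"],
    ("MS4A region", "rs655231"): ["RFX3"],
    ("SLC28A1", "rs12909280"): ["RFX3"],
    ("CLU", "rs1532278"): ["NANOG", "TAF1", "USF1", "MAX", "USF2", "GATA2"],
    ("ABCA7", "rs3764650"): ["SP1", "HNF4A", "HNF4G", "BHLHE40", "USF1", "USF2"],
    ("ABCA7", "rs4147911"): ["IKZF1"],
}


def proteins_shared_across_loci(binding: Dict[Tuple[str, str], List[str]],
                                min_loci: int = 2) -> Dict[str, List[str]]:
    """Invert SNP -> proteins into protein -> loci; keep proteins seen at >= min_loci loci."""
    loci_by_protein: Dict[str, Set[str]] = defaultdict(set)
    for (locus, _rsid), proteins in binding.items():
        for protein in proteins:
            loci_by_protein[protein].add(locus)  # a set, so two SNPs at one locus count once
    return {protein: sorted(loci)
            for protein, loci in sorted(loci_by_protein.items())
            if len(loci) >= min_loci}


print(proteins_shared_across_loci(BINDING))
```

On this subset, RFX3 is shared by ZCWPW1, the MS4A region, and SLC28A1, and USF1 and USF2 are each shared by ABCA7 and CLU.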

Although RegulomeDB is an extensive database for annotating variants' effects on gene expression, it provides information only for selected DNA-binding elements in certain cell types. A total of 220 of the 614 variants we examined returned scores of "No Data", so their involvement in gene expression as related to LOAD pathogenesis cannot be ruled out. Along the same lines, some loci have markedly more SNPs tested for expression effects than others. Thus, we make no assumption that the number of putative regulatory variants at a given locus indicates the magnitude of that locus' role in risk and disease process. Moreover, the primary focus of our study was RegulomeDB and the prediction of regulatory effects on gene expression based on the data included in that database. Therefore, other regulatory mechanisms, such as regulation of RNA splicing, and predicted changes in protein structure and/or function were not covered in this study.